[% setvar title Variable-length lookbehind. %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p>Variable-length lookbehind.</p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Peter Heslin &lt;<a href='mailto:Peter.Heslin@ucd.ie'>Peter.Heslin@ucd.ie</a>&gt;
  Date: 9 Aug 2000
  Last Modified: 30 Sept 2000
  Mailing List: <a href='mailto:perl6-language-regex@perl.org'>perl6-language-regex@perl.org</a>
  Number: 72
  Version: 4
  Status: Developing</pre>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>In Perl 6, lookbehind in regular expressions should be extended to permit
variable-length as well as fixed-length patterns.</p>
<a name='CHANGES'></a><h1>CHANGES</h1>
<p>Version 4 is a withdrawal of the originally proposed syntax in favour,
in part, of RFC 317; the present RFC has not been withdrawn, because the
proposal to implement variable-length lookbehind still stands.</p>
<p>Version 3 is a revision to make it clearer that the purpose of the
proposal is to implement a general scheme for variable-length
lookbehind.  The title has been changed to reflect this.</p>
<p>Version 2 of this RFC redirects discussion of this topic to
<a href='mailto:perl6-language-regex@perl.org'>perl6-language-regex@perl.org</a>.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>The original proposal offered some new syntax as a portmanteau solution
to two very different problems: the lack of variable-length lookbehind
in Perl 5, and the lack of a method other than the monolithic and
inflexible <code>study</code> by which the programmer could pass hints to the
regexp optimizer, based upon her knowledge of the peculiarities of the
target string.  The latter problem has now been addressed much more
cleanly by Hugo van der Sanden in RFC 317, and so I have withdrawn the
originally proposed syntax, and all that remains for this RFC to discuss
is variable-length lookbehind.</p>
<a name='Syntax'></a><h2>Syntax</h2>
<p>I originally proposed new syntax designed to kill two birds with one
stone, but variable-length lookbehind needs no new syntax: it can use the
forms that already serve for fixed-length lookbehind, <code>(?&lt;=pattern)</code>
and <code>(?&lt;!pattern)</code>.</p>
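<p>To make the restriction concrete, here is a minimal sketch in Python rather
than Perl. Python's <code>re</code> module is used only as a stand-in, on the
assumption that its fixed-width rule is comparable to the Perl 5 limitation
described above. The patterns and sample string are illustrative and are not
taken from the RFC.</p>

```python
import re

# Fixed-width lookbehind: the pattern inside (?<=...) always spans 2 characters.
fixed = re.compile(r"(?<=\w )\d")
fixed_result = fixed.search("a 1").group()

# Variable-width lookbehind: same syntax, but \s* can span any number of
# characters, so Python's re module refuses to compile it.
try:
    re.compile(r"(?<=\w\s*)\d")
    variable_ok = True
except re.error as err:
    variable_ok = False
    print("rejected at compile time:", err)
```

<p>Under this proposal the second pattern would compile and behave like the
first, with no change to what the programmer writes.</p>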
<a name='Related Features'></a><h2>Related Features</h2>
<p>The original proposal suggested that it would be nice if <code>pos</code> in list
context were to return the offset into the target string where
the match begins, as well as where it ends.  Some people seemed to like
this idea, so I am leaving it in, even though it no longer has much to
do with the content of this RFC.</p>
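<p>As a rough analogy only, Python's match objects already expose both
offsets through <code>span()</code>. The sketch below shows the kind of pair
that <code>pos</code> in list context might return. It is not a statement of
how Perl would implement it, and the sample string is made up.</p>

```python
import re

# Find the first run of digits and ask where the match starts and ends.
match = re.search(r"\d+", "abc 123 def")
start, end = match.span()

# `end` is what a scalar pos would report; `start` is the extra offset
# the RFC suggests making available as well.
print(start, end)
```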
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>Ilya Zakharevich says it is trivial ;-)  Here is part of his comment on
an earlier version of this RFC:</p>
<pre>================================================================= </pre>
<p>As I said it for many times: this is absolutely trivial to implement.
First of all, if you agree to rewrite</p>
<pre> (?&lt;= \w\s*\d )         # Semantic X: match &quot;a  1&quot;</pre>
<p>as</p>
<pre> (?&lt;= \d\s*\w )         # Semantic Y: match &quot;a  1&quot;</pre>
<p>then it is as simple as inserting go-back-by(1) nodes before each node
for \s \d and \w.</p>
<p>And to support the more intuitive ;-) semantic X, the only
more-or-less tricky part is to recursively go through the compile
tree, and put &quot;concatenated&quot; nodes in the opposite order.  A piece of
cake.</p>
<pre>================================================================= </pre>
<p>The original RFC proposed syntax corresponding to Ilya's semantic Y,
precisely because I thought it would be easier to implement.  Several
people objected that this was obscure, so I changed it to semantic X.
This means that Perl, and not the programmer, would have to reverse the
regexp in order to match it backwards from the start of the current
match.</p>
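<p>The backwards matching described above can be imitated, very roughly, in
Python. This sketch assumes the pattern has already been reversed by hand
(semantic Y), and it copies and reverses the text, which a real engine would
avoid by walking the original buffer in place. The helper name
<code>lookbehind_reversed</code> is hypothetical.</p>

```python
import re

def lookbehind_reversed(reversed_pattern: str, text: str, pos: int) -> bool:
    """Report whether reversed_pattern matches text read backwards from pos."""
    # Illustration only: copy the prefix and reverse it so that an ordinary
    # forward match corresponds to a backward walk from pos.
    backwards = text[:pos][::-1]
    # re.match anchors at the start of `backwards`, i.e. at `pos` in `text`.
    return re.match(reversed_pattern, backwards) is not None

text = "a  1b"
# Semantic X would be written \w\s*\d; here it is reversed by hand.
hit = lookbehind_reversed(r"\d\s*\w", text, 4)   # looks back over "a  1"
miss = lookbehind_reversed(r"\d\s*\w", text, 1)  # looks back over "a"
```

<p>With semantic X the reversal of <code>\w\s*\d</code> into
<code>\d\s*\w</code> would be done by the regexp compiler, not by the
caller.</p>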
<p>One thing that should be avoided is to implement lookbehind by starting
at offset 0 in the target string and simply matching forwards from
there.  As Hugo has pointed out, this would throw away the work of the
optimizer and lead to quadratic behaviour.  This is why variable-length
lookbehind has to approach the match in reverse, starting from the
current offset of the start of match.</p>
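<p>For contrast, here is a sketch of the forward approach that should be
avoided, again in Python and with a hypothetical helper name. It tries each
possible starting offset from 0 up to the current position, so the work per
lookbehind grows with the offset. Repeating that at every position in the
string is where the quadratic behaviour comes from.</p>

```python
import re

def lookbehind_forward(pattern: str, text: str, pos: int) -> tuple[bool, int]:
    """Naive lookbehind: scan forwards from offset 0, counting attempts."""
    compiled = re.compile(pattern)
    attempts = 0
    for start in range(pos + 1):
        attempts += 1
        # Does the pattern match exactly the slice text[start:pos]?
        if compiled.fullmatch(text, start, pos):
            return True, attempts
    return False, attempts

# 100 characters of filler, then the part the lookbehind actually cares about.
text = "x " * 50 + "a  1"
matched, attempts = lookbehind_forward(r"\w\s*\d", text, len(text))
print(matched, attempts)
```

<p>A backward match starting at the current offset would only need to
inspect the last few characters, however long the filler is.</p>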
<p>It would be acceptable if, for simplicity of implementation,
variable-length lookbehind were restricted to a subset of regular
expression syntax, since even that would be an improvement over what we
have now.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>RFC 317: Access to optimisation information for regular expressions</p>
</div>
